Part 1: Data Loading and Cleaning

1.1 Background

The raw dataset, restaurants.csv, starts with 10018 observations across 34 attributes, encompassing a broad range of restaurant information. In its initial state, the data presents several challenges requiring immediate preparation: critical analytical variables such as rating and userRatingCount contain missing values, necessitating filtering or imputation to maintain data integrity for quality assessment.

Furthermore, many of the 34 columns, particularly the numerous service and amenity indicators (e.g., delivery, servesCocktails), are stored inefficiently as generic object (string) types instead of proper Booleans or numerics, alongside price information which also requires type conversion. Additionally, columns like id, name, formattedAddress, and geographical coordinates, while useful for identification, must be removed or handled separately before conducting core statistical analysis and exploratory data visualization.

1.2: Data Attributes

  • rating: The primary target variable for prediction/analysis.

  • userRatingCount: Critical feature indicating popularity/credibility.

  • primaryType: The category of restaurant (e.g., Mexican, Thai). Essential for segmentation.

  • businessStatus: Crucial to filter out closed/temporarily closed restaurants if analyzing operational performance.

  • priceStartUSD, priceEndUSD: Important for price segmentation, but they require cleaning.

  • Service & Amenity Booleans: (takeout, delivery, servesDinner, servesWine, servesCocktails, wheelchairAccessible… etc.): These may be valuable for feature engineering and market segmentation.

# Load the dataset CSV
restuarant_data = read_csv("data/restaurants.csv")
## Rows: 10018 Columns: 34
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): id, name, formattedAddress, phone, businessStatus, primaryType, go...
## dbl  (6): rating, userRatingCount, latitude, longitude, priceStartUSD, price...
## lgl (21): takeout, delivery, dineIn, curbsidePickup, reservable, servesLunch...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
cat("---Number of Obervations and Attributes---\n")
## ---Number of Obervations and Attributes---
cat("Number of Observations (# of rows): " , nrow(restuarant_data), "\n")
## Number of Observations (# of rows):  10018
cat("Number of Attributes (# of columns): " , ncol(restuarant_data), "\n")
## Number of Attributes (# of columns):  34
# The 'glimpse' function provides a transposed view of the data, which is great for viewing types
glimpse(restuarant_data)
## Rows: 10,018
## Columns: 34
## $ id                           <chr> "ChIJ--c8h4jRD4gRRY6i7bZEpZU", "ChIJ--eTT…
## $ name                         <chr> "Lady Gregory's Irish Bar & Restaurant", …
## $ rating                       <dbl> 4.5, 4.5, 4.5, 2.6, 4.0, NA, 4.2, NA, 4.7…
## $ userRatingCount              <dbl> 2822, 572, 1132, 61, 835, NA, 339, NA, 13…
## $ formattedAddress             <chr> "5260 N Clark St, Chicago, IL 60640, USA"…
## $ latitude                     <dbl> 41.97789, 41.79242, 41.81210, 41.87463, 4…
## $ longitude                    <dbl> -87.66856, -87.78884, -87.70782, -87.6686…
## $ phone                        <chr> "(773) 271-5050", "(773) 586-2828", "(773…
## $ businessStatus               <chr> "OPERATIONAL", "OPERATIONAL", "OPERATIONA…
## $ primaryType                  <chr> "restaurant", "pizza_restaurant", "seafoo…
## $ takeout                      <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, NA, TRUE, T…
## $ delivery                     <lgl> TRUE, TRUE, FALSE, NA, TRUE, NA, TRUE, NA…
## $ dineIn                       <lgl> TRUE, NA, TRUE, TRUE, NA, TRUE, TRUE, TRU…
## $ curbsidePickup               <lgl> NA, NA, FALSE, FALSE, NA, NA, TRUE, NA, F…
## $ reservable                   <lgl> TRUE, FALSE, TRUE, FALSE, FALSE, NA, NA, …
## $ servesLunch                  <lgl> TRUE, NA, TRUE, TRUE, TRUE, NA, TRUE, TRU…
## $ servesDinner                 <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, NA, TRUE, T…
## $ servesBeer                   <lgl> TRUE, FALSE, FALSE, FALSE, FALSE, NA, FAL…
## $ servesWine                   <lgl> TRUE, FALSE, FALSE, FALSE, FALSE, NA, FAL…
## $ liveMusic                    <lgl> FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, …
## $ servesCocktails              <lgl> TRUE, FALSE, TRUE, FALSE, FALSE, NA, NA, …
## $ goodForChildren              <lgl> TRUE, NA, TRUE, TRUE, NA, NA, TRUE, NA, N…
## $ acceptsCreditCards           <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, NA, TRUE, T…
## $ acceptsDebitCards            <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, NA, TRUE, T…
## $ acceptsCashOnly              <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, NA, FA…
## $ acceptsNfc                   <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, NA, TRUE, T…
## $ freeParkingLot               <lgl> NA, NA, TRUE, NA, TRUE, NA, TRUE, NA, NA,…
## $ freeStreetParking            <lgl> TRUE, TRUE, TRUE, NA, TRUE, NA, TRUE, NA,…
## $ wheelchairAccessibleEntrance <lgl> TRUE, NA, TRUE, TRUE, TRUE, NA, TRUE, TRU…
## $ wheelchairAccessibleRestroom <lgl> TRUE, NA, TRUE, TRUE, NA, NA, TRUE, NA, T…
## $ wheelchairAccessibleSeating  <lgl> TRUE, FALSE, TRUE, TRUE, NA, NA, NA, NA, …
## $ priceStartUSD                <dbl> 20, 10, NA, 10, 10, NA, NA, NA, 10, 20, 1…
## $ priceEndUSD                  <dbl> 30, 20, NA, 20, 20, NA, NA, NA, 20, 30, 1…
## $ googleMapsUri                <chr> "https://maps.google.com/?cid=10783100435…
# Summary provides min, max, median, mean, and quartiles for numeric columns
summary(restuarant_data)
##       id                name               rating      userRatingCount  
##  Length:10018       Length:10018       Min.   :1.000   Min.   :    1.0  
##  Class :character   Class :character   1st Qu.:3.900   1st Qu.:  101.0  
##  Mode  :character   Mode  :character   Median :4.300   Median :  323.0  
##                                        Mean   :4.171   Mean   :  641.3  
##                                        3rd Qu.:4.600   3rd Qu.:  760.0  
##                                        Max.   :5.000   Max.   :23596.0  
##                                        NA's   :540     NA's   :540      
##  formattedAddress      latitude       longitude         phone          
##  Length:10018       Min.   :41.64   Min.   :-87.94   Length:10018      
##  Class :character   1st Qu.:41.81   1st Qu.:-87.77   Class :character  
##  Mode  :character   Median :41.89   Median :-87.70   Mode  :character  
##                     Mean   :41.87   Mean   :-87.72                     
##                     3rd Qu.:41.94   3rd Qu.:-87.65                     
##                     Max.   :42.02   Max.   :-87.52                     
##                                                                        
##  businessStatus     primaryType         takeout         delivery      
##  Length:10018       Length:10018       Mode :logical   Mode :logical  
##  Class :character   Class :character   FALSE:119       FALSE:1252     
##  Mode  :character   Mode  :character   TRUE :9093      TRUE :7257     
##                                        NA's :806       NA's :1509     
##                                                                       
##                                                                       
##                                                                       
##    dineIn        curbsidePickup  reservable      servesLunch    
##  Mode :logical   Mode :logical   Mode :logical   Mode :logical  
##  FALSE:300       FALSE:2220      FALSE:3191      FALSE:193      
##  TRUE :8279      TRUE :2343      TRUE :2959      TRUE :7912     
##  NA's :1439      NA's :5455      NA's :3868      NA's :1913     
##                                                                 
##                                                                 
##                                                                 
##  servesDinner    servesBeer      servesWine      liveMusic      
##  Mode :logical   Mode :logical   Mode :logical   Mode :logical  
##  FALSE:316       FALSE:4071      FALSE:4199      FALSE:6618     
##  TRUE :7561      TRUE :2878      TRUE :2524      TRUE :560      
##  NA's :2141      NA's :3069      NA's :3295      NA's :2840     
##                                                                 
##                                                                 
##                                                                 
##  servesCocktails goodForChildren acceptsCreditCards acceptsDebitCards
##  Mode :logical   Mode :logical   Mode :logical      Mode :logical    
##  FALSE:3833      FALSE:752       FALSE:51           FALSE:87         
##  TRUE :2522      TRUE :6475      TRUE :8347         TRUE :8658       
##  NA's :3663      NA's :2791      NA's :1620         NA's :1273       
##                                                                      
##                                                                      
##                                                                      
##  acceptsCashOnly acceptsNfc      freeParkingLot  freeStreetParking
##  Mode :logical   Mode :logical   Mode :logical   Mode :logical    
##  FALSE:8983      FALSE:212       FALSE:217       FALSE:99         
##  TRUE :116       TRUE :7271      TRUE :4371      TRUE :5004       
##  NA's :919       NA's :2535      NA's :5430      NA's :4915       
##                                                                   
##                                                                   
##                                                                   
##  wheelchairAccessibleEntrance wheelchairAccessibleRestroom
##  Mode :logical                Mode :logical               
##  FALSE:245                    FALSE:193                   
##  TRUE :6816                   TRUE :5141                  
##  NA's :2957                   NA's :4684                  
##                                                           
##                                                           
##                                                           
##  wheelchairAccessibleSeating priceStartUSD     priceEndUSD    
##  Mode :logical               Min.   :  1.00   Min.   : 10.00  
##  FALSE:375                   1st Qu.: 10.00   1st Qu.: 20.00  
##  TRUE :5114                  Median : 10.00   Median : 20.00  
##  NA's :4529                  Mean   : 13.45   Mean   : 23.87  
##                              3rd Qu.: 10.00   3rd Qu.: 20.00  
##                              Max.   :100.00   Max.   :100.00  
##                              NA's   :2550     NA's   :2638    
##  googleMapsUri     
##  Length:10018      
##  Class :character  
##  Mode  :character  
##                    
##                    
##                    
## 
# View all column names
colnames(restuarant_data)
##  [1] "id"                           "name"                        
##  [3] "rating"                       "userRatingCount"             
##  [5] "formattedAddress"             "latitude"                    
##  [7] "longitude"                    "phone"                       
##  [9] "businessStatus"               "primaryType"                 
## [11] "takeout"                      "delivery"                    
## [13] "dineIn"                       "curbsidePickup"              
## [15] "reservable"                   "servesLunch"                 
## [17] "servesDinner"                 "servesBeer"                  
## [19] "servesWine"                   "liveMusic"                   
## [21] "servesCocktails"              "goodForChildren"             
## [23] "acceptsCreditCards"           "acceptsDebitCards"           
## [25] "acceptsCashOnly"              "acceptsNfc"                  
## [27] "freeParkingLot"               "freeStreetParking"           
## [29] "wheelchairAccessibleEntrance" "wheelchairAccessibleRestroom"
## [31] "wheelchairAccessibleSeating"  "priceStartUSD"               
## [33] "priceEndUSD"                  "googleMapsUri"
# View data in highly formatted output
datatable(restuarant_data)
## Warning in instance$preRenderHook(instance): It seems your data is too big for
## client-side DataTables. You may consider server-side processing:
## https://rstudio.github.io/DT/server.html

1.3: Attributes to Drop

  • id, googleMapsUri: Unique identifier (non-analytical values).
  • formattedAddress, phone: Not useful for statistical modeling without extensive NLP or geo-processing
  • name: Too high cardinality/unique.
  • lattitude, longitude: High dimensional, rarely used in simple models).
# drop attributes
# tidyverse uses `select` with negative sign (-)
clean_data = restuarant_data %>% 
  select(-id, -googleMapsUri, -formattedAddress, -phone, -name, -latitude, -longitude)

1.4: Clean Rows With Missing values

Will focus on:

  • rating, userRatingCount drop rows missing values
  • priceStartUSD, priceEndUSD drop rows with missing values and ensure values are numeric
  • Clean Boolean columns by replacing missing values or NA values with FALSE
# Handle missing values in critical columns
clean_data = clean_data %>%
  filter(!is.na(rating) & !is.na(userRatingCount))

# Clean boolean columns
bool_cols = c("takeout", "delivery", "dineIn", "curbsidePickup", "reservable", 
                 "servesLunch", "servesDinner", "servesBeer", "servesWine", "liveMusic", 
                 "servesCocktails", "goodForChildren", "acceptsCreditCards", 
                 "acceptsDebitCards", "acceptsCashOnly", "acceptsNfc", "freeParkingLot", 
                 "freeStreetParking", "wheelchairAccessibleEntrance", 
                 "wheelchairAccessibleRestroom", "wheelchairAccessibleSeating")

# Ensure columns are logical (TRUE/FALSE) and impute missing as FALSE
clean_data = clean_data %>%
  mutate(across(all_of(bool_cols),
                ~ifelse(is.na(.) | . == "False", FALSE, TRUE)))

# Clean price columns
clean_data = clean_data %>%
  mutate(priceStartUSD = as.numeric(priceStartUSD),
         priceEndUSD = as.numeric(priceEndUSD)) %>%
  filter(!is.na(priceStartUSD) & !is.na(priceEndUSD))

1.5: Final Check Of Cleaned Data Dimensions and Missing Values

cat("\n--- Final Cleaned Data Dimensions (R) ---\n")
## 
## --- Final Cleaned Data Dimensions (R) ---
cat("Observations (Rows):", nrow(clean_data), "\n")
## Observations (Rows): 7378
cat("Attributes (Columns):", ncol(clean_data), "\n")
## Attributes (Columns): 27
cat("\n--- Missing Values Check After Final Clean (R) ---\n")
## 
## --- Missing Values Check After Final Clean (R) ---
sapply(clean_data, function(x) sum(is.na(x)))
##                       rating              userRatingCount 
##                            0                            0 
##               businessStatus                  primaryType 
##                            0                            0 
##                      takeout                     delivery 
##                            0                            0 
##                       dineIn               curbsidePickup 
##                            0                            0 
##                   reservable                  servesLunch 
##                            0                            0 
##                 servesDinner                   servesBeer 
##                            0                            0 
##                   servesWine                    liveMusic 
##                            0                            0 
##              servesCocktails              goodForChildren 
##                            0                            0 
##           acceptsCreditCards            acceptsDebitCards 
##                            0                            0 
##              acceptsCashOnly                   acceptsNfc 
##                            0                            0 
##               freeParkingLot            freeStreetParking 
##                            0                            0 
## wheelchairAccessibleEntrance wheelchairAccessibleRestroom 
##                            0                            0 
##  wheelchairAccessibleSeating                priceStartUSD 
##                            0                            0 
##                  priceEndUSD 
##                            0
# Save Cleaned Data
output_file = "data/restaurant_cleaned.csv"
write_csv(clean_data, output_file)

Part 2: Exploratory Data Analysis (EDA)

# Display structure and summary of cleaned data
summary(clean_data)
##      rating      userRatingCount   businessStatus     primaryType       
##  Min.   :1.200   Min.   :    2.0   Length:7378        Length:7378       
##  1st Qu.:3.900   1st Qu.:  159.2   Class :character   Class :character  
##  Median :4.300   Median :  384.0   Mode  :character   Mode  :character  
##  Mean   :4.195   Mean   :  681.8                                        
##  3rd Qu.:4.500   3rd Qu.:  808.0                                        
##  Max.   :5.000   Max.   :23596.0                                        
##   takeout         delivery         dineIn        curbsidePickup 
##  Mode :logical   Mode :logical   Mode :logical   Mode :logical  
##  FALSE:141       FALSE:541       FALSE:500       FALSE:3709     
##  TRUE :7237      TRUE :6837      TRUE :6878      TRUE :3669     
##                                                                 
##                                                                 
##                                                                 
##  reservable      servesLunch     servesDinner    servesBeer     
##  Mode :logical   Mode :logical   Mode :logical   Mode :logical  
##  FALSE:2099      FALSE:643       FALSE:838       FALSE:1427     
##  TRUE :5279      TRUE :6735      TRUE :6540      TRUE :5951     
##                                                                 
##                                                                 
##                                                                 
##  servesWine      liveMusic       servesCocktails goodForChildren
##  Mode :logical   Mode :logical   Mode :logical   Mode :logical  
##  FALSE:1592      FALSE:1421      FALSE:1941      FALSE:1309     
##  TRUE :5786      TRUE :5957      TRUE :5437      TRUE :6069     
##                                                                 
##                                                                 
##                                                                 
##  acceptsCreditCards acceptsDebitCards acceptsCashOnly acceptsNfc     
##  Mode :logical      Mode :logical     Mode :logical   Mode :logical  
##  FALSE:703          FALSE:453         FALSE:235       FALSE:1254     
##  TRUE :6675         TRUE :6925        TRUE :7143      TRUE :6124     
##                                                                      
##                                                                      
##                                                                      
##  freeParkingLot  freeStreetParking wheelchairAccessibleEntrance
##  Mode :logical   Mode :logical     Mode :logical               
##  FALSE:3558      FALSE:3138        FALSE:1749                  
##  TRUE :3820      TRUE :4240        TRUE :5629                  
##                                                                
##                                                                
##                                                                
##  wheelchairAccessibleRestroom wheelchairAccessibleSeating priceStartUSD  
##  Mode :logical                Mode :logical               Min.   : 1.00  
##  FALSE:3040                   FALSE:2697                  1st Qu.:10.00  
##  TRUE :4338                   TRUE :4681                  Median :10.00  
##                                                           Mean   :12.42  
##                                                           3rd Qu.:10.00  
##                                                           Max.   :50.00  
##   priceEndUSD    
##  Min.   : 10.00  
##  1st Qu.: 20.00  
##  Median : 20.00  
##  Mean   : 23.87  
##  3rd Qu.: 20.00  
##  Max.   :100.00

2.1: Bar chart for Top 15 Most Common Restaurant Categories

top_categories = clean_data %>%
  count(primaryType, sort = TRUE) %>%
  top_n(15)
## Selecting by n
# Create the bar chart
ggplot(top_categories, aes(x = reorder(primaryType, n), y = n)) +
  geom_col(fill = "lightblue") +
  coord_flip() +  # Flips the axes to make labels easier to read
  labs(
    title = "Top 15 Most Common Restaurant Categories",
    x = "Restaurant Category",
    y = "Number of Restaurants"
  ) +
  theme_minimal() # Number of restaurants in the top 15 categories

2.2: Bar chart for Business Status

###Bar chart for Business Status
ggplot(clean_data, aes(x = reorder(businessStatus, -table(businessStatus)[businessStatus]))) +
  geom_bar(fill = "skyblue", color = "black") +
  geom_text(stat = 'count', aes(label = after_stat(count)), vjust = -0.3) +
  labs(
    title = "Distribution of Restaurant Business Status",
    x = "Business Status",
    y = "Number of Restaurants"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

2.3: Overlay Histogram for Rating by Reservability

# Create Overlay Histogram for Rating by Reservability
ggplot(clean_data, aes(x = rating, fill = factor(reservable))) +
  geom_histogram(aes(y = after_stat(count / sum(count)) * 100),
                 position = "identity", alpha = 0.6, bins = 20, color = "black") +
  scale_fill_manual(values = c("steelblue", "salmon", "lightgreen"), name = "Reservable") +
  labs(
    title = "Percentage Distribution of Ratings by Reservability",
    x = "Rating",
    y = "Percentage of Restaurants"
  ) +
  theme_minimal()

# Using geom_density is often a better choice than an overlayed histogram for comparing distributions because it
# provides a smoother representation and avoids issues with binning and overlapping bars, which can be hard to interpret.
mean_ratings = clean_data %>%
  group_by(reservable) %>%
  summarise(mean_rating = mean(rating))

# Create an overlayed density plot for Rating by Reservability
ggplot(clean_data, aes(x = rating, fill = reservable)) +
  geom_density(alpha = 0.6, color = "black") +
  geom_vline(data = mean_ratings, aes(xintercept = mean_rating, color = reservable),
             linetype = "dashed", linewidth = 1, show.legend = FALSE) +
  scale_fill_manual(
    name = "Reservable",
    values = c("FALSE" = "salmon", "TRUE" = "steelblue"),
    labels = c("No", "Yes")
  ) +
  scale_color_manual(
    values = c("FALSE" = "darkred", "TRUE" = "darkblue")
  ) +
  labs(
    title = "Distribution of Ratings by Reservability",
    x = "Rating",
    y = "Density"
  ) +
  theme_minimal()

2.4: Histogram Distribution of Ratings

Histogram helps to visualize the central tendancy and spread of restaurant ratings

ggplot(clean_data, aes(x=rating)) +
  geom_histogram(binwidth = 0.2, fill='skyblue', color='black', alpha=0.8) +
  # Add density line (equivalent to KDE)
  geom_density(color = "blue", linewidth = 1) +
  # Add vertical line for the mean
  geom_vline(aes(xintercept = mean(rating)),
             color = "red", linetype = "dashed", linewidth = 1,
             show.legend = TRUE) +
  labs(
    title = 'Distribution of Restuarant Rating In Chicago',
    x = 'Rating',
    y = 'Frequency'
  )

# Histogram of ratings in Chicago
#hist(clean_data$rating, main="Rating of Restaurants in Chicago", xlab="Ratings", col="lightblue")

# Histogram of rating counts in Chicago
hist(clean_data$userRatingCount, main="Rating Count of Restaurants in Chicago", xlab="Rating Count", breaks=100, col="lightblue") 

The histogram reveals that majority of restaurants have rating clustered between 4.0 and 4.6, suggesting a high concentration of well-regarded bussinesses. The mean rating is approximately 4.195 comfirms the positive skew.

2.5 Boxplot: Rating Distribution by Business Type

Boxplot will help visualize how quantitative variable (rating) is distributed across different categories (primaryType)

# identity top 10 types
top_types = clean_data %>%
  count(primaryType, sort = TRUE) %>%
  slice_head(n=10) %>%
  pull(primaryType)

data_top_types = clean_data %>%
  # convert primaryType to a factor for proper ordering in the plot
  mutate(primaryType=factor(primaryType, levels = rev(top_types)))

# generate boxplot
ggplot(data_top_types, aes(x=rating, y=primaryType, fill=primaryType)) +
  geom_boxplot() +
  labs(
    title = 'Rating Distribution Across Top 10 Restaurant Types',
    x = 'Rating',
    y = 'Primary Type'
  )

# Ratings boxplot
boxplot(x=clean_data$rating, main="Ratings of Restaurants in Chicago", col="lightblue") 


Part 3: Hypothesis Constructions

3.1: ANOVA Test for Restaurant primaryType vs. Rating

We are testing whether the population mean of rating are equal across the different categories of restaurant primaryType (or among the top 5 types).

  • Null hypothesis (\(H_0\)): The true mean of restaurant rating is the same for all primary restaurant types.

\[ H_0: {\mu}_{type1} = {\mu}_{type2} = {\mu}_{type2} = ... = {\mu}_{type5} \]

  • Alternative hypothesis (\(H_1\)): At least one primary restaurant type has a true mean rating that is significantly different from the others.

\[ H_1: {\mu}_{type1} \ne {\mu}_{type2} \ne ... \ne {\mu}_{type5} \]

  • Test Method: One way Analysis of Variance (ANOVA). ANOVA test to check the association between one numerical variable and one categorical variable.

  • Significance level (\(\alpha\)): 0.05

# filter for top 5 primaryTypes
top_5_primeTypes = clean_data %>%
  count(primaryType, sort = TRUE) %>%
  slice_head(n=5) %>%
  pull(primaryType)

top_5_anova_data = clean_data %>%
  filter(primaryType %in% top_5_primeTypes) %>%
  mutate(primaryType = factor(primaryType)) # Ensure the type is a factor

# ANOVA for top 5 primaryTypes
anova_5_results = aov(rating ~ primaryType, data=top_5_anova_data)

# view summary of top 5 primaryTypes
summary(anova_5_results)
##               Df Sum Sq Mean Sq F value Pr(>F)    
## primaryType    4   82.5  20.620   89.21 <2e-16 ***
## Residuals   4200  970.8   0.231                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# ANOVA for all primaryTypes
anova_results = aov(rating ~ primaryType, data=clean_data)

# View summary
summary(anova_results)
##               Df Sum Sq Mean Sq F value Pr(>F)    
## primaryType   55    294   5.345   25.43 <2e-16 ***
## Residuals   7322   1539   0.210                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conclusion

  • P-value Analysis: The calculated p-value is extremely small, for less than the significance level of \(\alpha=0.05\)
  • Decision: Since the p-value is less than \(\alpha\), we reject the null hypotheis.
  • We have statistical evidence to conclude that there is a difference in the mean customer rating among the top five primary restuarant types (including all types). In other words, the type of restaurant is a significant factor in predicting customer rating.

3.2: Hypothesis Test: Delivery vs. Rating

We are testing whether restaurants that offer delivery services have different average ratings compared to those that do not

  • Null Hypothesis: Restaurants with deliveries have the same average rating as those who don’t.

  • Alternative Hypothesis: Restaurants with deliveries have a higher average rating compared to those who don’t

  • Test Method: One-tailed t-test comparing means of two independent samples (delivery = TRUE vs. delivery = FALSE)

  • Significance level (\(\alpha\)): 0.05

# Check delivery value counts
table(clean_data$delivery)
## 
## FALSE  TRUE 
##   541  6837
delivery_true = clean_data$rating[clean_data$delivery == TRUE]
delivery_false = clean_data$rating[clean_data$delivery == FALSE]

# One-tailed t-test: H1 = delivery has higher rating
t_result = t.test(delivery_true, delivery_false, alternative = "greater", var.equal = FALSE)
t_result
## 
##  Welch Two Sample t-test
## 
## data:  delivery_true and delivery_false
## t = 0.34527, df = 588.19, p-value = 0.365
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  -0.03726101         Inf
## sample estimates:
## mean of x mean of y 
##  4.195832  4.185952

Conclusion

  • P-value Analysis: The calculated p-value is 0.365, which is greater than the significance level of \(\alpha=0.05\)
  • Decision: Since the p-value is greater than \(\alpha\), we fail to reject the null hypotheis.
  • There is insufficient statistical evidence to conclude that restaurants offering delivery services have higher average ratings compared to those that do not. In fact, restaurants that offer deliveries have a lower average rating of about 0.15

Part 4: Linear Regression Model with Subset Selection

library(MLmetrics)
## 
## Attaching package: 'MLmetrics'
## The following object is masked from 'package:base':
## 
##     Recall
library(MASS) # Run stepwise regression, MASS package required
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
## 
##     select
library(caTools) # for stratified splitting

4.1: Split data set into 80:20 train and test data with name RestaurantTraining and RestaurantTest respectively

# Setting the seed fixes the randomness in the split for reproducibility
set.seed(42)

# Ensure primaryType is a factor
data_to_split = clean_data %>%
  mutate(primaryType = as.factor(primaryType))

# perform stratified split on the primaryType column to maintain its distribution
split = sample.split(data_to_split$primaryType, SplitRatio = 0.8)
RestaurantTraining = subset(data_to_split, split == TRUE)
RestaurantTest = subset(data_to_split, split == FALSE)

# # Check new dimensions
# cat("Training set size:", nrow(RestaurantTraining), "\n")
# cat("Testing set size:", nrow(RestaurantTest), "\n")

4.2: Multiple Regression Model

# construct multiple linear regression model
mr_model = lm(rating ~ . , data = RestaurantTraining )

# Display summary of the model
# Residuals (The Error Distribution)
# - Median,"Should be close to zero. If the median is far from zero, it suggests the model is systematically biased (e.g., over- or under-predicting)."
# - Min & Max,Indicate the size of the largest errors. Extremely large positive or negative values suggest outliers or heteroscedasticity.
# - 1Q & 3Q,Show the spread of the middle 50% of the errors. These should be relatively symmetric around zero.
summary(mr_model)
## 
## Call:
## lm(formula = rating ~ ., data = RestaurantTraining)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -3.06849 -0.19696  0.05923  0.27832  1.17792 
## 
## Coefficients:
##                                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                           4.506e+00  2.660e-01  16.940  < 2e-16 ***
## userRatingCount                       3.113e-05  5.866e-06   5.306 1.16e-07 ***
## businessStatusCLOSED_TEMPORARILY      2.614e-01  1.345e-01   1.944 0.051957 .  
## businessStatusOPERATIONAL             1.866e-01  1.302e-01   1.434 0.151653    
## primaryTypeafghani_restaurant        -3.271e-01  5.022e-01  -0.651 0.514878    
## primaryTypeafrican_restaurant        -2.960e-01  2.760e-01  -1.073 0.283516    
## primaryTypeamerican_restaurant       -4.058e-01  2.266e-01  -1.791 0.073326 .  
## primaryTypeasian_restaurant          -3.565e-01  2.348e-01  -1.519 0.128936    
## primaryTypebagel_shop                -4.768e-01  2.531e-01  -1.884 0.059679 .  
## primaryTypebakery                    -1.970e-01  2.390e-01  -0.824 0.409995    
## primaryTypebar                       -3.191e-01  2.281e-01  -1.399 0.161899    
## primaryTypebar_and_grill             -2.935e-01  2.294e-01  -1.279 0.200884    
## primaryTypebarbecue_restaurant       -6.084e-01  2.335e-01  -2.605 0.009203 ** 
## primaryTypebrazilian_restaurant      -2.812e-01  3.195e-01  -0.880 0.378736    
## primaryTypebreakfast_restaurant      -3.291e-01  2.284e-01  -1.440 0.149800    
## primaryTypebrunch_restaurant         -3.475e-01  2.529e-01  -1.374 0.169469    
## primaryTypebuffet_restaurant         -6.461e-01  2.696e-01  -2.396 0.016600 *  
## primaryTypecafe                      -4.589e-01  2.310e-01  -1.987 0.046993 *  
## primaryTypecafeteria                  2.670e-01  5.013e-01   0.533 0.594296    
## primaryTypechinese_restaurant        -6.010e-01  2.272e-01  -2.645 0.008189 ** 
## primaryTypecoffee_shop               -6.989e-01  2.269e-01  -3.080 0.002080 ** 
## primaryTypedeli                       2.915e-02  2.547e-01   0.114 0.908862    
## primaryTypediner                     -3.329e-01  2.414e-01  -1.379 0.167866    
## primaryTypedonut_shop                -2.581e-01  2.576e-01  -1.002 0.316447    
## primaryTypefast_food_restaurant      -7.627e-01  2.282e-01  -3.342 0.000836 ***
## primaryTypefine_dining_restaurant    -4.743e-01  3.898e-01  -1.217 0.223665    
## primaryTypefood_court                -3.954e-02  3.887e-01  -0.102 0.918982    
## primaryTypefood_store                -1.098e-01  3.015e-01  -0.364 0.715765    
## primaryTypefrench_restaurant         -2.362e-01  2.673e-01  -0.884 0.376796    
## primaryTypegreek_restaurant          -3.097e-01  2.446e-01  -1.266 0.205592    
## primaryTypehamburger_restaurant      -5.093e-01  2.325e-01  -2.190 0.028545 *  
## primaryTypeindian_restaurant         -4.151e-01  2.330e-01  -1.781 0.074894 .  
## primaryTypeitalian_restaurant        -2.866e-01  2.288e-01  -1.252 0.210439    
## primaryTypejapanese_restaurant       -2.903e-01  2.338e-01  -1.241 0.214535    
## primaryTypejuice_shop                -3.313e-01  2.396e-01  -1.382 0.166892    
## primaryTypekorean_restaurant         -2.199e-01  2.377e-01  -0.925 0.355011    
## primaryTypelebanese_restaurant        1.318e-02  3.896e-01   0.034 0.973017    
## primaryTypemeal_delivery             -9.386e-01  2.338e-01  -4.015 6.03e-05 ***
## primaryTypemeal_takeaway             -2.581e-01  2.405e-01  -1.073 0.283214    
## primaryTypemediterranean_restaurant  -2.436e-01  2.334e-01  -1.043 0.296828    
## primaryTypemexican_restaurant        -4.086e-01  2.260e-01  -1.808 0.070646 .  
## primaryTypemiddle_eastern_restaurant -1.887e-01  2.367e-01  -0.797 0.425415    
## primaryTypenight_club                -4.618e-01  5.005e-01  -0.923 0.356171    
## primaryTypepizza_restaurant          -5.064e-01  2.266e-01  -2.235 0.025442 *  
## primaryTypepub                       -2.438e-01  2.353e-01  -1.036 0.300048    
## primaryTyperamen_restaurant          -2.172e-01  2.418e-01  -0.898 0.369146    
## primaryTyperestaurant                -4.160e-01  2.255e-01  -1.845 0.065126 .  
## primaryTypesandwich_shop             -6.946e-01  2.267e-01  -3.064 0.002193 ** 
## primaryTypeseafood_restaurant        -4.524e-01  2.312e-01  -1.957 0.050393 .  
## primaryTypespanish_restaurant        -3.263e-01  2.912e-01  -1.121 0.262527    
## primaryTypesteak_house               -3.588e-01  2.453e-01  -1.463 0.143579    
## primaryTypesushi_restaurant          -1.352e-01  2.325e-01  -0.581 0.560989    
## primaryTypetea_house                 -1.239e-01  3.174e-01  -0.391 0.696168    
## primaryTypethai_restaurant           -2.609e-01  2.304e-01  -1.133 0.257418    
## primaryTypeturkish_restaurant        -5.449e-02  2.703e-01  -0.202 0.840264    
## primaryTypevegan_restaurant          -1.328e-01  2.455e-01  -0.541 0.588536    
## primaryTypevegetarian_restaurant     -5.087e-01  3.428e-01  -1.484 0.137967    
## primaryTypevietnamese_restaurant     -1.767e-01  2.378e-01  -0.743 0.457588    
## primaryTypewine_bar                  -2.711e-01  3.009e-01  -0.901 0.367705    
## takeoutTRUE                          -3.823e-02  4.660e-02  -0.821 0.411940    
## deliveryTRUE                          8.497e-02  2.398e-02   3.544 0.000398 ***
## dineInTRUE                            3.477e-02  2.572e-02   1.352 0.176469    
## curbsidePickupTRUE                    6.097e-02  1.224e-02   4.980 6.56e-07 ***
## reservableTRUE                       -2.400e-03  1.787e-02  -0.134 0.893127    
## servesLunchTRUE                      -2.746e-02  2.956e-02  -0.929 0.352911    
## servesDinnerTRUE                     -1.184e-01  2.657e-02  -4.457 8.46e-06 ***
## servesBeerTRUE                        4.065e-02  3.238e-02   1.255 0.209461    
## servesWineTRUE                        6.476e-02  2.939e-02   2.204 0.027572 *  
## liveMusicTRUE                        -1.809e-02  2.233e-02  -0.810 0.418006    
## servesCocktailsTRUE                  -8.048e-02  2.656e-02  -3.031 0.002451 ** 
## goodForChildrenTRUE                   1.389e-01  1.842e-02   7.540 5.43e-14 ***
## acceptsCreditCardsTRUE                9.165e-02  2.831e-02   3.237 0.001215 ** 
## acceptsDebitCardsTRUE                -7.960e-02  3.883e-02  -2.050 0.040385 *  
## acceptsCashOnlyTRUE                  -1.496e-01  5.334e-02  -2.804 0.005062 ** 
## acceptsNfcTRUE                       -5.688e-02  1.935e-02  -2.940 0.003299 ** 
## freeParkingLotTRUE                   -7.612e-02  1.381e-02  -5.512 3.69e-08 ***
## freeStreetParkingTRUE                 1.059e-01  1.357e-02   7.807 6.92e-15 ***
## wheelchairAccessibleEntranceTRUE     -9.515e-02  1.760e-02  -5.407 6.67e-08 ***
## wheelchairAccessibleRestroomTRUE     -2.438e-02  1.622e-02  -1.503 0.132918    
## wheelchairAccessibleSeatingTRUE       3.507e-02  1.716e-02   2.044 0.041025 *  
## priceStartUSD                         3.368e-03  3.006e-03   1.120 0.262551    
## priceEndUSD                           1.294e-03  1.684e-03   0.768 0.442288    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.447 on 5819 degrees of freedom
## Multiple R-squared:  0.2088, Adjusted R-squared:  0.1978 
## F-statistic: 18.96 on 81 and 5819 DF,  p-value: < 2.2e-16

Conclusion

  • The following predictors appear to have a statistically significant relationship to the response variable (p-value < 0.05): userRatingCount, deliveryTRUE, curbsidePickupTRUE, servesDinnerTRUE, servesWineTRUE, servesCocktailsTRUE, goodForChildrenTRUE, acceptsCreditCardsTRUE, acceptsDebitCardsTRUE, acceptsCashOnlyTRUE, acceptsNfcTRUE, freeParkingLotTRUE, freeStreetParkingTRUE, wheelchairAccessibleEntranceTRUE, wheelchairAccessibleSeatingTRUE, and several levels of the categorical variable primaryTypebarbecue_restaurant, primaryTypebuffet_restaurant primaryTypecafe primaryTypechinese_restaurant primaryTypecoffee_shop primaryTypefast_food_restaurant primaryTypehamburger_restaurant primaryTypemeal_delivery primaryTypepizza_restaurant primaryTypesandwich_shop.

  • Residual Standard Error (RSE) is 0.447 on 5819 degrees of freedom.

  • Multiple R-squared (\(R^2\)) is 0.2088 The model explains 20.88% of the variance in rating.

  • Adjusted R-squared (\(R^2\)) is 0.1978. This is a better measure for comparison, as it penalizes models for including irrelevant variables. If the Adjusted \(R^2\) is much lower than \(R^2\), it confirms that many of your predictors (likely the numerous primaryType dummy variables) are not useful and the model is slightly overfit.

  • Use this model to predict rating in RestaurantTest and calculate MAE and MSE.
# Predicting rating
ypred = predict(object=mr_model, newdata = RestaurantTest)

# Mean Absolute Error of predicted rating, and Actual rating
# MAE measures the average magnitude of errors in a set of predictions, without considering their direction (it ignores positive or negative signs). It is the simplest and most intuitive metric because it is in the same units as the target variable
# On average, your model's predicted rating is off by $0.318$ rating points from the actual customer rating.
MAE(y_pred = ypred , y_true = RestaurantTest$rating)
## [1] 0.3188165
# # Mean Square Error of predicted rating, and Actual rating
# MSE measures the average squared difference between the predicted and actual values. By squaring the errors, $\text{MSE}$ penalizes large errors much more heavily than $\text{MAE}$
MSE(y_pred = ypred, y_true = RestaurantTest$rating)
## [1] 0.1849335

4.3: Subset Selection Linear Regression Model

# create a null model / intercept only mode
null_model = lm(rating ~ 1, data = RestaurantTraining)

# create a full model
full_model = lm(rating~. , data = RestaurantTraining)

# perform step-wise selection using stepAIC()
step_forward = stepAIC(null_model, direction='forward', scope=formula(full_model))
## Start:  AIC=-8201.43
## rating ~ 1
## 
##                                Df Sum of Sq    RSS     AIC
## + primaryType                  55   223.312 1246.2 -9064.1
## + priceStartUSD                 1    77.820 1391.7 -8520.5
## + priceEndUSD                   1    60.760 1408.8 -8448.6
## + freeParkingLot                1    28.461 1441.1 -8314.8
## + acceptsNfc                    1    14.890 1454.6 -8259.5
## + servesLunch                   1    13.929 1455.6 -8255.6
## + servesDinner                  1    13.087 1456.5 -8252.2
## + goodForChildren               1    12.015 1457.5 -8247.9
## + acceptsDebitCards             1    11.077 1458.5 -8244.1
## + wheelchairAccessibleEntrance  1    11.067 1458.5 -8244.0
## + acceptsCashOnly               1     8.889 1460.6 -8235.2
## + wheelchairAccessibleSeating   1     7.905 1461.6 -8231.3
## + servesCocktails               1     7.168 1462.4 -8228.3
## + liveMusic                     1     6.969 1462.6 -8227.5
## + reservable                    1     5.223 1464.3 -8220.4
## + userRatingCount               1     4.934 1464.6 -8219.3
## + dineIn                        1     2.282 1467.2 -8208.6
## + acceptsCreditCards            1     2.241 1467.3 -8208.4
## + takeout                       1     1.926 1467.6 -8207.2
## + servesWine                    1     1.770 1467.8 -8206.5
## + curbsidePickup                1     1.654 1467.9 -8206.1
## + servesBeer                    1     1.344 1468.2 -8204.8
## + freeStreetParking             1     1.006 1468.5 -8203.5
## + wheelchairAccessibleRestroom  1     0.771 1468.8 -8202.5
## <none>                                      1469.5 -8201.4
## + delivery                      1     0.439 1469.1 -8201.2
## + businessStatus                2     0.934 1468.6 -8201.2
## 
## Step:  AIC=-9064.09
## rating ~ primaryType
## 
##                                Df Sum of Sq    RSS     AIC
## + priceStartUSD                 1   13.1952 1233.0 -9124.9
## + priceEndUSD                   1   12.2480 1234.0 -9120.4
## + goodForChildren               1    7.3870 1238.8 -9097.2
## + curbsidePickup                1    6.4304 1239.8 -9092.6
## + wheelchairAccessibleEntrance  1    5.9788 1240.2 -9090.5
## + servesDinner                  1    5.9537 1240.3 -9090.3
## + acceptsCashOnly               1    5.6555 1240.6 -9088.9
## + acceptsDebitCards             1    5.5139 1240.7 -9088.3
## + acceptsNfc                    1    5.1755 1241.0 -9086.6
## + userRatingCount               1    4.7609 1241.5 -9084.7
## + freeStreetParking             1    4.2582 1242.0 -9082.3
## + freeParkingLot                1    4.2314 1242.0 -9082.2
## + servesLunch                   1    4.2030 1242.0 -9082.0
## + delivery                      1    1.8330 1244.4 -9070.8
## + liveMusic                     1    1.7038 1244.5 -9070.2
## + servesWine                    1    1.2065 1245.0 -9067.8
## + dineIn                        1    0.7416 1245.5 -9065.6
## + takeout                       1    0.6993 1245.5 -9065.4
## + servesBeer                    1    0.5761 1245.7 -9064.8
## <none>                                      1246.2 -9064.1
## + businessStatus                2    0.7759 1245.5 -9063.8
## + servesCocktails               1    0.2401 1246.0 -9063.2
## + wheelchairAccessibleSeating   1    0.2313 1246.0 -9063.2
## + acceptsCreditCards            1    0.1199 1246.1 -9062.7
## + reservable                    1    0.0296 1246.2 -9062.2
## + wheelchairAccessibleRestroom  1    0.0281 1246.2 -9062.2
## 
## Step:  AIC=-9124.9
## rating ~ primaryType + priceStartUSD
## 
##                                Df Sum of Sq    RSS     AIC
## + wheelchairAccessibleEntrance  1    7.8771 1225.2 -9160.7
## + goodForChildren               1    7.1272 1225.9 -9157.1
## + servesDinner                  1    7.1013 1225.9 -9157.0
## + acceptsDebitCards             1    6.9493 1226.1 -9156.3
## + acceptsCashOnly               1    6.3733 1226.7 -9153.5
## + acceptsNfc                    1    6.1603 1226.9 -9152.5
## + curbsidePickup                1    5.6498 1227.4 -9150.0
## + freeStreetParking             1    5.4927 1227.5 -9149.2
## + servesLunch                   1    3.1223 1229.9 -9137.9
## + userRatingCount               1    2.9930 1230.0 -9137.2
## + freeParkingLot                1    2.4348 1230.6 -9134.6
## + delivery                      1    1.7627 1231.3 -9131.3
## + liveMusic                     1    1.2735 1231.8 -9129.0
## + servesCocktails               1    0.7240 1232.3 -9126.4
## + wheelchairAccessibleRestroom  1    0.7018 1232.3 -9126.3
## + businessStatus                2    0.8620 1232.2 -9125.0
## + servesWine                    1    0.4280 1232.6 -9125.0
## <none>                                      1233.0 -9124.9
## + reservable                    1    0.3747 1232.7 -9124.7
## + takeout                       1    0.3236 1232.7 -9124.5
## + dineIn                        1    0.2651 1232.8 -9124.2
## + servesBeer                    1    0.1232 1232.9 -9123.5
## + acceptsCreditCards            1    0.0065 1233.0 -9122.9
## + wheelchairAccessibleSeating   1    0.0038 1233.0 -9122.9
## + priceEndUSD                   1    0.0002 1233.0 -9122.9
## 
## Step:  AIC=-9160.72
## rating ~ primaryType + priceStartUSD + wheelchairAccessibleEntrance
## 
##                                Df Sum of Sq    RSS     AIC
## + goodForChildren               1   11.9171 1213.2 -9216.4
## + freeStreetParking             1    7.8357 1217.3 -9196.6
## + curbsidePickup                1    5.5520 1219.6 -9185.5
## + userRatingCount               1    5.2887 1219.9 -9184.3
## + servesDinner                  1    4.3291 1220.8 -9179.6
## + acceptsDebitCards             1    3.5097 1221.6 -9175.7
## + acceptsCashOnly               1    3.2901 1221.9 -9174.6
## + delivery                      1    2.9768 1222.2 -9173.1
## + acceptsNfc                    1    2.8353 1222.3 -9172.4
## + servesWine                    1    1.9664 1223.2 -9168.2
## + wheelchairAccessibleSeating   1    1.8893 1223.3 -9167.8
## + servesLunch                   1    1.4940 1223.7 -9165.9
## + servesBeer                    1    1.3280 1223.8 -9165.1
## + dineIn                        1    1.1358 1224.0 -9164.2
## + freeParkingLot                1    1.1306 1224.0 -9164.2
## <none>                                      1225.2 -9160.7
## + wheelchairAccessibleRestroom  1    0.3172 1224.8 -9160.2
## + acceptsCreditCards            1    0.3139 1224.8 -9160.2
## + liveMusic                     1    0.2999 1224.8 -9160.2
## + businessStatus                2    0.6180 1224.5 -9159.7
## + takeout                       1    0.1055 1225.0 -9159.2
## + reservable                    1    0.0394 1225.1 -9158.9
## + servesCocktails               1    0.0182 1225.1 -9158.8
## + priceEndUSD                   1    0.0165 1225.1 -9158.8
## 
## Step:  AIC=-9216.4
## rating ~ primaryType + priceStartUSD + wheelchairAccessibleEntrance + 
##     goodForChildren
## 
##                                Df Sum of Sq    RSS     AIC
## + servesDinner                  1    8.5151 1204.7 -9256.0
## + acceptsCashOnly               1    7.4746 1205.8 -9250.9
## + freeStreetParking             1    6.5156 1206.7 -9246.2
## + acceptsDebitCards             1    6.2485 1207.0 -9244.9
## + curbsidePickup                1    5.9500 1207.3 -9243.4
## + userRatingCount               1    4.4390 1208.8 -9236.0
## + acceptsNfc                    1    4.2570 1209.0 -9235.1
## + servesLunch                   1    3.9060 1209.3 -9233.4
## + delivery                      1    2.2192 1211.0 -9225.2
## + liveMusic                     1    2.1014 1211.1 -9224.6
## + freeParkingLot                1    1.3494 1211.9 -9221.0
## + wheelchairAccessibleSeating   1    0.6120 1212.6 -9217.4
## + servesWine                    1    0.5057 1212.7 -9216.9
## <none>                                      1213.2 -9216.4
## + takeout                       1    0.3972 1212.8 -9216.3
## + servesCocktails               1    0.3788 1212.8 -9216.2
## + businessStatus                2    0.7321 1212.5 -9216.0
## + dineIn                        1    0.2443 1213.0 -9215.6
## + acceptsCreditCards            1    0.1928 1213.0 -9215.3
## + servesBeer                    1    0.1782 1213.0 -9215.3
## + reservable                    1    0.1266 1213.1 -9215.0
## + wheelchairAccessibleRestroom  1    0.0138 1213.2 -9214.5
## + priceEndUSD                   1    0.0074 1213.2 -9214.4
## 
## Step:  AIC=-9255.96
## rating ~ primaryType + priceStartUSD + wheelchairAccessibleEntrance + 
##     goodForChildren + servesDinner
## 
##                                Df Sum of Sq    RSS     AIC
## + freeStreetParking             1    7.7245 1197.0 -9291.9
## + curbsidePickup                1    6.3059 1198.4 -9284.9
## + userRatingCount               1    5.4326 1199.3 -9280.6
## + acceptsCashOnly               1    3.3826 1201.3 -9270.6
## + delivery                      1    3.1954 1201.5 -9269.6
## + acceptsDebitCards             1    3.1035 1201.6 -9269.2
## + acceptsNfc                    1    2.7077 1202.0 -9267.2
## + wheelchairAccessibleSeating   1    1.4288 1203.3 -9261.0
## + servesWine                    1    1.3903 1203.3 -9260.8
## + freeParkingLot                1    0.8805 1203.8 -9258.3
## + servesBeer                    1    0.8702 1203.8 -9258.2
## + liveMusic                     1    0.6821 1204.0 -9257.3
## + servesLunch                   1    0.6305 1204.1 -9257.1
## <none>                                      1204.7 -9256.0
## + dineIn                        1    0.3535 1204.4 -9255.7
## + businessStatus                2    0.5703 1204.2 -9254.8
## + reservable                    1    0.0518 1204.7 -9254.2
## + acceptsCreditCards            1    0.0462 1204.7 -9254.2
## + takeout                       1    0.0280 1204.7 -9254.1
## + priceEndUSD                   1    0.0173 1204.7 -9254.0
## + wheelchairAccessibleRestroom  1    0.0126 1204.7 -9254.0
## + servesCocktails               1    0.0107 1204.7 -9254.0
## 
## Step:  AIC=-9291.92
## rating ~ primaryType + priceStartUSD + wheelchairAccessibleEntrance + 
##     goodForChildren + servesDinner + freeStreetParking
## 
##                                Df Sum of Sq    RSS     AIC
## + curbsidePickup                1    5.8111 1191.2 -9318.6
## + userRatingCount               1    5.1661 1191.8 -9315.4
## + freeParkingLot                1    5.1259 1191.9 -9315.2
## + acceptsCashOnly               1    4.0710 1192.9 -9310.0
## + acceptsDebitCards             1    3.6984 1193.3 -9308.2
## + acceptsNfc                    1    3.2275 1193.8 -9305.9
## + delivery                      1    2.7626 1194.2 -9303.6
## + servesWine                    1    0.9353 1196.1 -9294.5
## + wheelchairAccessibleSeating   1    0.8050 1196.2 -9293.9
## + liveMusic                     1    0.7626 1196.2 -9293.7
## + servesLunch                   1    0.7517 1196.2 -9293.6
## + servesBeer                    1    0.4837 1196.5 -9292.3
## <none>                                      1197.0 -9291.9
## + dineIn                        1    0.3100 1196.7 -9291.5
## + businessStatus                2    0.6851 1196.3 -9291.3
## + servesCocktails               1    0.1481 1196.8 -9290.7
## + takeout                       1    0.0840 1196.9 -9290.3
## + priceEndUSD                   1    0.0101 1197.0 -9290.0
## + wheelchairAccessibleRestroom  1    0.0080 1197.0 -9290.0
## + acceptsCreditCards            1    0.0046 1197.0 -9289.9
## + reservable                    1    0.0017 1197.0 -9289.9
## 
## Step:  AIC=-9318.64
## rating ~ primaryType + priceStartUSD + wheelchairAccessibleEntrance + 
##     goodForChildren + servesDinner + freeStreetParking + curbsidePickup
## 
##                                Df Sum of Sq    RSS     AIC
## + freeParkingLot                1    5.8030 1185.4 -9345.5
## + userRatingCount               1    4.6745 1186.5 -9339.8
## + acceptsCashOnly               1    3.9956 1187.2 -9336.5
## + acceptsDebitCards             1    3.8173 1187.4 -9335.6
## + acceptsNfc                    1    3.1479 1188.0 -9332.3
## + delivery                      1    1.7966 1189.4 -9325.5
## + servesWine                    1    1.2005 1190.0 -9322.6
## + wheelchairAccessibleSeating   1    0.7459 1190.4 -9320.3
## + servesBeer                    1    0.7343 1190.5 -9320.3
## + servesLunch                   1    0.6636 1190.5 -9319.9
## + liveMusic                     1    0.5247 1190.7 -9319.2
## <none>                                      1191.2 -9318.6
## + businessStatus                2    0.6784 1190.5 -9318.0
## + dineIn                        1    0.1829 1191.0 -9317.5
## + takeout                       1    0.1540 1191.0 -9317.4
## + servesCocktails               1    0.0947 1191.1 -9317.1
## + acceptsCreditCards            1    0.0171 1191.2 -9316.7
## + priceEndUSD                   1    0.0112 1191.2 -9316.7
## + wheelchairAccessibleRestroom  1    0.0069 1191.2 -9316.7
## + reservable                    1    0.0055 1191.2 -9316.7
## 
## Step:  AIC=-9345.46
## rating ~ primaryType + priceStartUSD + wheelchairAccessibleEntrance + 
##     goodForChildren + servesDinner + freeStreetParking + curbsidePickup + 
##     freeParkingLot
## 
##                                Df Sum of Sq    RSS     AIC
## + userRatingCount               1    4.8170 1180.6 -9367.5
## + acceptsCashOnly               1    4.2835 1181.1 -9364.8
## + acceptsDebitCards             1    3.9070 1181.5 -9362.9
## + acceptsNfc                    1    3.0568 1182.3 -9358.7
## + delivery                      1    1.8491 1183.5 -9352.7
## + servesWine                    1    1.2140 1184.2 -9349.5
## + wheelchairAccessibleSeating   1    0.8423 1184.5 -9347.7
## + servesBeer                    1    0.7574 1184.6 -9347.2
## + servesLunch                   1    0.6590 1184.7 -9346.7
## + liveMusic                     1    0.5761 1184.8 -9346.3
## <none>                                      1185.4 -9345.5
## + businessStatus                2    0.7579 1184.6 -9345.2
## + dineIn                        1    0.1705 1185.2 -9344.3
## + takeout                       1    0.1492 1185.2 -9344.2
## + servesCocktails               1    0.0682 1185.3 -9343.8
## + acceptsCreditCards            1    0.0236 1185.3 -9343.6
## + wheelchairAccessibleRestroom  1    0.0113 1185.4 -9343.5
## + reservable                    1    0.0103 1185.4 -9343.5
## + priceEndUSD                   1    0.0000 1185.4 -9343.5
## 
## Step:  AIC=-9367.49
## rating ~ primaryType + priceStartUSD + wheelchairAccessibleEntrance + 
##     goodForChildren + servesDinner + freeStreetParking + curbsidePickup + 
##     freeParkingLot + userRatingCount
## 
##                                Df Sum of Sq    RSS     AIC
## + acceptsCashOnly               1    4.3537 1176.2 -9387.3
## + acceptsDebitCards             1    4.0675 1176.5 -9385.9
## + acceptsNfc                    1    3.8273 1176.7 -9384.6
## + delivery                      1    1.6968 1178.9 -9374.0
## + servesWine                    1    0.9395 1179.6 -9370.2
## + servesLunch                   1    0.8764 1179.7 -9369.9
## + liveMusic                     1    0.6059 1180.0 -9368.5
## + servesBeer                    1    0.5401 1180.0 -9368.2
## + businessStatus                2    0.8085 1179.8 -9367.5
## <none>                                      1180.6 -9367.5
## + wheelchairAccessibleSeating   1    0.3796 1180.2 -9367.4
## + servesCocktails               1    0.2157 1180.3 -9366.6
## + takeout                       1    0.2145 1180.3 -9366.6
## + dineIn                        1    0.1366 1180.4 -9366.2
## + acceptsCreditCards            1    0.0537 1180.5 -9365.8
## + wheelchairAccessibleRestroom  1    0.0209 1180.5 -9365.6
## + reservable                    1    0.0176 1180.5 -9365.6
## + priceEndUSD                   1    0.0091 1180.5 -9365.5
## 
## Step:  AIC=-9387.29
## rating ~ primaryType + priceStartUSD + wheelchairAccessibleEntrance + 
##     goodForChildren + servesDinner + freeStreetParking + curbsidePickup + 
##     freeParkingLot + userRatingCount + acceptsCashOnly
## 
##                                Df Sum of Sq    RSS     AIC
## + delivery                      1   2.14160 1174.1 -9396.0
## + acceptsNfc                    1   2.01750 1174.2 -9395.4
## + acceptsCreditCards            1   1.84518 1174.4 -9394.6
## + servesWine                    1   1.42153 1174.8 -9392.4
## + servesBeer                    1   0.93849 1175.3 -9390.0
## + acceptsDebitCards             1   0.83497 1175.4 -9389.5
## + wheelchairAccessibleSeating   1   0.43651 1175.8 -9387.5
## + businessStatus                2   0.83462 1175.4 -9387.5
## <none>                                      1176.2 -9387.3
## + servesLunch                   1   0.27154 1175.9 -9386.7
## + dineIn                        1   0.21008 1176.0 -9386.3
## + liveMusic                     1   0.10768 1176.1 -9385.8
## + servesCocktails               1   0.05643 1176.2 -9385.6
## + wheelchairAccessibleRestroom  1   0.02987 1176.2 -9385.4
## + takeout                       1   0.01529 1176.2 -9385.4
## + priceEndUSD                   1   0.00575 1176.2 -9385.3
## + reservable                    1   0.00016 1176.2 -9385.3
## 
## Step:  AIC=-9396.04
## rating ~ primaryType + priceStartUSD + wheelchairAccessibleEntrance + 
##     goodForChildren + servesDinner + freeStreetParking + curbsidePickup + 
##     freeParkingLot + userRatingCount + acceptsCashOnly + delivery
## 
##                                Df Sum of Sq    RSS     AIC
## + acceptsNfc                    1   2.50451 1171.6 -9406.6
## + acceptsCreditCards            1   1.62176 1172.4 -9402.2
## + servesWine                    1   1.29249 1172.8 -9400.5
## + acceptsDebitCards             1   1.13794 1172.9 -9399.8
## + servesBeer                    1   0.85615 1173.2 -9398.3
## + businessStatus                2   0.80600 1173.3 -9396.1
## <none>                                      1174.1 -9396.0
## + wheelchairAccessibleSeating   1   0.39140 1173.7 -9396.0
## + servesLunch                   1   0.28208 1173.8 -9395.5
## + dineIn                        1   0.15423 1173.9 -9394.8
## + liveMusic                     1   0.11159 1174.0 -9394.6
## + servesCocktails               1   0.08680 1174.0 -9394.5
## + takeout                       1   0.08080 1174.0 -9394.4
## + wheelchairAccessibleRestroom  1   0.04735 1174.0 -9394.3
## + reservable                    1   0.01147 1174.0 -9394.1
## + priceEndUSD                   1   0.00895 1174.1 -9394.1
## 
## Step:  AIC=-9406.64
## rating ~ primaryType + priceStartUSD + wheelchairAccessibleEntrance + 
##     goodForChildren + servesDinner + freeStreetParking + curbsidePickup + 
##     freeParkingLot + userRatingCount + acceptsCashOnly + delivery + 
##     acceptsNfc
## 
##                                Df Sum of Sq    RSS     AIC
## + acceptsCreditCards            1   2.01908 1169.5 -9414.8
## + servesWine                    1   1.68301 1169.9 -9413.1
## + servesBeer                    1   1.17121 1170.4 -9410.5
## + wheelchairAccessibleSeating   1   0.54850 1171.0 -9407.4
## <none>                                      1171.6 -9406.6
## + businessStatus                2   0.78309 1170.8 -9406.6
## + acceptsDebitCards             1   0.32557 1171.2 -9406.3
## + servesLunch                   1   0.21456 1171.3 -9405.7
## + dineIn                        1   0.16382 1171.4 -9405.5
## + liveMusic                     1   0.09438 1171.5 -9405.1
## + takeout                       1   0.07136 1171.5 -9405.0
## + wheelchairAccessibleRestroom  1   0.01650 1171.5 -9404.7
## + reservable                    1   0.01603 1171.5 -9404.7
## + servesCocktails               1   0.01463 1171.5 -9404.7
## + priceEndUSD                   1   0.01043 1171.5 -9404.7
## 
## Step:  AIC=-9414.82
## rating ~ primaryType + priceStartUSD + wheelchairAccessibleEntrance + 
##     goodForChildren + servesDinner + freeStreetParking + curbsidePickup + 
##     freeParkingLot + userRatingCount + acceptsCashOnly + delivery + 
##     acceptsNfc + acceptsCreditCards
## 
##                                Df Sum of Sq    RSS     AIC
## + servesWine                    1   1.13150 1168.4 -9418.5
## + acceptsDebitCards             1   0.85783 1168.7 -9417.2
## + businessStatus                2   1.23411 1168.3 -9417.1
## + servesBeer                    1   0.67700 1168.9 -9416.2
## + wheelchairAccessibleSeating   1   0.48838 1169.0 -9415.3
## <none>                                      1169.5 -9414.8
## + servesLunch                   1   0.23853 1169.3 -9414.0
## + liveMusic                     1   0.19779 1169.3 -9413.8
## + dineIn                        1   0.13657 1169.4 -9413.5
## + priceEndUSD                   1   0.09352 1169.5 -9413.3
## + wheelchairAccessibleRestroom  1   0.08855 1169.5 -9413.3
## + takeout                       1   0.07445 1169.5 -9413.2
## + servesCocktails               1   0.04232 1169.5 -9413.0
## + reservable                    1   0.01516 1169.5 -9412.9
## 
## Step:  AIC=-9418.53
## rating ~ primaryType + priceStartUSD + wheelchairAccessibleEntrance + 
##     goodForChildren + servesDinner + freeStreetParking + curbsidePickup + 
##     freeParkingLot + userRatingCount + acceptsCashOnly + delivery + 
##     acceptsNfc + acceptsCreditCards + servesWine
## 
##                                Df Sum of Sq    RSS     AIC
## + servesCocktails               1   1.67620 1166.7 -9425.0
## + businessStatus                2   1.27228 1167.1 -9421.0
## + acceptsDebitCards             1   0.74135 1167.7 -9420.3
## <none>                                      1168.4 -9418.5
## + wheelchairAccessibleSeating   1   0.34625 1168.1 -9418.3
## + liveMusic                     1   0.32111 1168.1 -9418.2
## + servesLunch                   1   0.26972 1168.1 -9417.9
## + dineIn                        1   0.19287 1168.2 -9417.5
## + wheelchairAccessibleRestroom  1   0.12605 1168.3 -9417.2
## + priceEndUSD                   1   0.10302 1168.3 -9417.1
## + takeout                       1   0.08629 1168.3 -9417.0
## + reservable                    1   0.01921 1168.4 -9416.6
## + servesBeer                    1   0.00133 1168.4 -9416.5
## 
## Step:  AIC=-9425.01
## rating ~ primaryType + priceStartUSD + wheelchairAccessibleEntrance + 
##     goodForChildren + servesDinner + freeStreetParking + curbsidePickup + 
##     freeParkingLot + userRatingCount + acceptsCashOnly + delivery + 
##     acceptsNfc + acceptsCreditCards + servesWine + servesCocktails
## 
##                                Df Sum of Sq    RSS     AIC
## + businessStatus                2   1.16481 1165.6 -9426.9
## + acceptsDebitCards             1   0.72056 1166.0 -9426.7
## + wheelchairAccessibleSeating   1   0.41604 1166.3 -9425.1
## <none>                                      1166.7 -9425.0
## + servesBeer                    1   0.27919 1166.5 -9424.4
## + servesLunch                   1   0.23816 1166.5 -9424.2
## + liveMusic                     1   0.23198 1166.5 -9424.2
## + dineIn                        1   0.18747 1166.5 -9424.0
## + takeout                       1   0.11841 1166.6 -9423.6
## + priceEndUSD                   1   0.11685 1166.6 -9423.6
## + wheelchairAccessibleRestroom  1   0.10638 1166.6 -9423.5
## + reservable                    1   0.00000 1166.7 -9423.0
## 
## Step:  AIC=-9426.9
## rating ~ primaryType + priceStartUSD + wheelchairAccessibleEntrance + 
##     goodForChildren + servesDinner + freeStreetParking + curbsidePickup + 
##     freeParkingLot + userRatingCount + acceptsCashOnly + delivery + 
##     acceptsNfc + acceptsCreditCards + servesWine + servesCocktails + 
##     businessStatus
## 
##                                Df Sum of Sq    RSS     AIC
## + acceptsDebitCards             1   0.78007 1164.8 -9428.9
## + wheelchairAccessibleSeating   1   0.48255 1165.1 -9427.3
## <none>                                      1165.6 -9426.9
## + servesBeer                    1   0.28819 1165.3 -9426.4
## + dineIn                        1   0.23162 1165.3 -9426.1
## + servesLunch                   1   0.20987 1165.4 -9426.0
## + liveMusic                     1   0.18368 1165.4 -9425.8
## + priceEndUSD                   1   0.12254 1165.5 -9425.5
## + takeout                       1   0.11753 1165.5 -9425.5
## + wheelchairAccessibleRestroom  1   0.09447 1165.5 -9425.4
## + reservable                    1   0.00020 1165.6 -9424.9
## 
## Step:  AIC=-9428.85
## rating ~ primaryType + priceStartUSD + wheelchairAccessibleEntrance + 
##     goodForChildren + servesDinner + freeStreetParking + curbsidePickup + 
##     freeParkingLot + userRatingCount + acceptsCashOnly + delivery + 
##     acceptsNfc + acceptsCreditCards + servesWine + servesCocktails + 
##     businessStatus + acceptsDebitCards
## 
##                                Df Sum of Sq    RSS     AIC
## + wheelchairAccessibleSeating   1   0.53076 1164.3 -9429.5
## <none>                                      1164.8 -9428.9
## + servesBeer                    1   0.27626 1164.5 -9428.3
## + dineIn                        1   0.25126 1164.5 -9428.1
## + servesLunch                   1   0.20096 1164.6 -9427.9
## + liveMusic                     1   0.19087 1164.6 -9427.8
## + takeout                       1   0.11642 1164.7 -9427.4
## + priceEndUSD                   1   0.11292 1164.7 -9427.4
## + wheelchairAccessibleRestroom  1   0.09205 1164.7 -9427.3
## + reservable                    1   0.00140 1164.8 -9426.9
## 
## Step:  AIC=-9429.54
## rating ~ primaryType + priceStartUSD + wheelchairAccessibleEntrance + 
##     goodForChildren + servesDinner + freeStreetParking + curbsidePickup + 
##     freeParkingLot + userRatingCount + acceptsCashOnly + delivery + 
##     acceptsNfc + acceptsCreditCards + servesWine + servesCocktails + 
##     businessStatus + acceptsDebitCards + wheelchairAccessibleSeating
## 
##                                Df Sum of Sq    RSS     AIC
## <none>                                      1164.3 -9429.5
## + wheelchairAccessibleRestroom  1   0.36116 1163.9 -9429.4
## + servesBeer                    1   0.25154 1164.0 -9428.8
## + servesLunch                   1   0.23140 1164.0 -9428.7
## + dineIn                        1   0.21964 1164.0 -9428.7
## + liveMusic                     1   0.19266 1164.1 -9428.5
## + priceEndUSD                   1   0.14054 1164.1 -9428.3
## + takeout                       1   0.11530 1164.1 -9428.1
## + reservable                    1   0.01891 1164.2 -9427.6

4.3.1: View results of forward stepwise regression

step_forward$anova
## Stepwise Model Path 
## Analysis of Deviance Table
## 
## Initial Model:
## rating ~ 1
## 
## Final Model:
## rating ~ primaryType + priceStartUSD + wheelchairAccessibleEntrance + 
##     goodForChildren + servesDinner + freeStreetParking + curbsidePickup + 
##     freeParkingLot + userRatingCount + acceptsCashOnly + delivery + 
##     acceptsNfc + acceptsCreditCards + servesWine + servesCocktails + 
##     businessStatus + acceptsDebitCards + wheelchairAccessibleSeating
## 
## 
##                              Step Df    Deviance Resid. Df Resid. Dev       AIC
## 1                                                     5900   1469.533 -8201.435
## 2                   + primaryType 55 223.3117353      5845   1246.221 -9064.089
## 3                 + priceStartUSD  1  13.1952052      5844   1233.026 -9124.902
## 4  + wheelchairAccessibleEntrance  1   7.8771187      5843   1225.149 -9160.722
## 5               + goodForChildren  1  11.9170991      5842   1213.232 -9216.402
## 6                  + servesDinner  1   8.5150768      5841   1204.717 -9255.964
## 7             + freeStreetParking  1   7.7244907      5840   1196.992 -9291.922
## 8                + curbsidePickup  1   5.8111151      5839   1191.181 -9318.640
## 9                + freeParkingLot  1   5.8029831      5838   1185.378 -9345.458
## 10              + userRatingCount  1   4.8169676      5837   1180.561 -9367.486
## 11              + acceptsCashOnly  1   4.3537149      5836   1176.208 -9387.288
## 12                     + delivery  1   2.1415976      5835   1174.066 -9396.043
## 13                   + acceptsNfc  1   2.5045054      5834   1171.562 -9406.644
## 14           + acceptsCreditCards  1   2.0190782      5833   1169.543 -9414.823
## 15                   + servesWine  1   1.1315030      5832   1168.411 -9418.534
## 16              + servesCocktails  1   1.6762050      5831   1166.735 -9425.006
## 17               + businessStatus  2   1.1648069      5829   1165.570 -9426.900
## 18            + acceptsDebitCards  1   0.7800684      5828   1164.790 -9428.851
## 19  + wheelchairAccessibleSeating  1   0.5307645      5827   1164.259 -9429.540

4.3.2 View final model

summary(step_forward)
## 
## Call:
## lm(formula = rating ~ primaryType + priceStartUSD + wheelchairAccessibleEntrance + 
##     goodForChildren + servesDinner + freeStreetParking + curbsidePickup + 
##     freeParkingLot + userRatingCount + acceptsCashOnly + delivery + 
##     acceptsNfc + acceptsCreditCards + servesWine + servesCocktails + 
##     businessStatus + acceptsDebitCards + wheelchairAccessibleSeating, 
##     data = RestaurantTraining)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.0882 -0.1989  0.0631  0.2787  1.1542 
## 
## Coefficients:
##                                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                           4.494e+00  2.625e-01  17.117  < 2e-16 ***
## primaryTypeafghani_restaurant        -3.412e-01  5.019e-01  -0.680 0.496628    
## primaryTypeafrican_restaurant        -2.965e-01  2.753e-01  -1.077 0.281500    
## primaryTypeamerican_restaurant       -4.084e-01  2.264e-01  -1.804 0.071286 .  
## primaryTypeasian_restaurant          -3.614e-01  2.345e-01  -1.541 0.123379    
## primaryTypebagel_shop                -4.752e-01  2.526e-01  -1.882 0.059943 .  
## primaryTypebakery                    -1.873e-01  2.383e-01  -0.786 0.431968    
## primaryTypebar                       -3.160e-01  2.276e-01  -1.389 0.164999    
## primaryTypebar_and_grill             -2.945e-01  2.291e-01  -1.285 0.198682    
## primaryTypebarbecue_restaurant       -6.135e-01  2.333e-01  -2.630 0.008572 ** 
## primaryTypebrazilian_restaurant      -2.874e-01  3.192e-01  -0.900 0.367927    
## primaryTypebreakfast_restaurant      -3.331e-01  2.281e-01  -1.461 0.144176    
## primaryTypebrunch_restaurant         -3.640e-01  2.524e-01  -1.442 0.149335    
## primaryTypebuffet_restaurant         -6.460e-01  2.695e-01  -2.397 0.016557 *  
## primaryTypecafe                      -4.636e-01  2.306e-01  -2.010 0.044448 *  
## primaryTypecafeteria                  2.725e-01  5.005e-01   0.544 0.586172    
## primaryTypechinese_restaurant        -6.036e-01  2.270e-01  -2.659 0.007856 ** 
## primaryTypecoffee_shop               -6.945e-01  2.264e-01  -3.068 0.002167 ** 
## primaryTypedeli                       3.070e-02  2.544e-01   0.121 0.903939    
## primaryTypediner                     -3.350e-01  2.411e-01  -1.389 0.164818    
## primaryTypedonut_shop                -2.390e-01  2.568e-01  -0.931 0.352027    
## primaryTypefast_food_restaurant      -7.408e-01  2.269e-01  -3.265 0.001101 ** 
## primaryTypefine_dining_restaurant    -4.413e-01  3.888e-01  -1.135 0.256330    
## primaryTypefood_court                -3.344e-02  3.878e-01  -0.086 0.931281    
## primaryTypefood_store                -9.907e-02  3.008e-01  -0.329 0.741878    
## primaryTypefrench_restaurant         -2.185e-01  2.669e-01  -0.819 0.413038    
## primaryTypegreek_restaurant          -3.113e-01  2.444e-01  -1.274 0.202862    
## primaryTypehamburger_restaurant      -5.091e-01  2.323e-01  -2.192 0.028450 *  
## primaryTypeindian_restaurant         -4.196e-01  2.327e-01  -1.803 0.071443 .  
## primaryTypeitalian_restaurant        -2.915e-01  2.286e-01  -1.275 0.202399    
## primaryTypejapanese_restaurant       -2.942e-01  2.336e-01  -1.259 0.207950    
## primaryTypejuice_shop                -3.197e-01  2.387e-01  -1.339 0.180505    
## primaryTypekorean_restaurant         -2.172e-01  2.376e-01  -0.914 0.360550    
## primaryTypelebanese_restaurant        1.740e-02  3.893e-01   0.045 0.964358    
## primaryTypemeal_delivery             -9.287e-01  2.333e-01  -3.981 6.95e-05 ***
## primaryTypemeal_takeaway             -2.603e-01  2.403e-01  -1.083 0.278708    
## primaryTypemediterranean_restaurant  -2.420e-01  2.333e-01  -1.037 0.299630    
## primaryTypemexican_restaurant        -4.065e-01  2.257e-01  -1.801 0.071815 .  
## primaryTypemiddle_eastern_restaurant -1.891e-01  2.364e-01  -0.800 0.423790    
## primaryTypenight_club                -4.536e-01  5.001e-01  -0.907 0.364475    
## primaryTypepizza_restaurant          -5.063e-01  2.264e-01  -2.236 0.025385 *  
## primaryTypepub                       -2.426e-01  2.347e-01  -1.034 0.301209    
## primaryTyperamen_restaurant          -2.203e-01  2.415e-01  -0.912 0.361756    
## primaryTyperestaurant                -4.177e-01  2.253e-01  -1.854 0.063816 .  
## primaryTypesandwich_shop             -6.977e-01  2.264e-01  -3.081 0.002072 ** 
## primaryTypeseafood_restaurant        -4.600e-01  2.310e-01  -1.991 0.046502 *  
## primaryTypespanish_restaurant        -3.158e-01  2.908e-01  -1.086 0.277446    
## primaryTypesteak_house               -3.549e-01  2.451e-01  -1.448 0.147662    
## primaryTypesushi_restaurant          -1.443e-01  2.321e-01  -0.622 0.534263    
## primaryTypetea_house                 -1.208e-01  3.170e-01  -0.381 0.703166    
## primaryTypethai_restaurant           -2.643e-01  2.301e-01  -1.149 0.250687    
## primaryTypeturkish_restaurant        -5.864e-02  2.701e-01  -0.217 0.828136    
## primaryTypevegan_restaurant          -1.334e-01  2.453e-01  -0.544 0.586612    
## primaryTypevegetarian_restaurant     -5.136e-01  3.426e-01  -1.499 0.133945    
## primaryTypevietnamese_restaurant     -1.776e-01  2.376e-01  -0.747 0.454799    
## primaryTypewine_bar                  -2.846e-01  3.005e-01  -0.947 0.343700    
## priceStartUSD                         5.758e-03  8.016e-04   7.183 7.65e-13 ***
## wheelchairAccessibleEntranceTRUE     -9.989e-02  1.691e-02  -5.909 3.64e-09 ***
## goodForChildrenTRUE                   1.375e-01  1.791e-02   7.677 1.89e-14 ***
## servesDinnerTRUE                     -1.309e-01  2.476e-02  -5.286 1.30e-07 ***
## freeStreetParkingTRUE                 1.058e-01  1.354e-02   7.808 6.84e-15 ***
## curbsidePickupTRUE                    6.189e-02  1.219e-02   5.078 3.93e-07 ***
## freeParkingLotTRUE                   -7.734e-02  1.375e-02  -5.625 1.94e-08 ***
## userRatingCount                       3.000e-05  5.832e-06   5.145 2.77e-07 ***
## acceptsCashOnlyTRUE                  -1.610e-01  5.200e-02  -3.096 0.001969 ** 
## deliveryTRUE                          8.310e-02  2.377e-02   3.495 0.000477 ***
## acceptsNfcTRUE                       -5.816e-02  1.918e-02  -3.032 0.002443 ** 
## acceptsCreditCardsTRUE                8.808e-02  2.763e-02   3.188 0.001441 ** 
## servesWineTRUE                        8.258e-02  2.368e-02   3.487 0.000492 ***
## servesCocktailsTRUE                  -6.878e-02  2.419e-02  -2.844 0.004476 ** 
## businessStatusCLOSED_TEMPORARILY      2.616e-01  1.343e-01   1.948 0.051425 .  
## businessStatusOPERATIONAL             1.873e-01  1.300e-01   1.441 0.149657    
## acceptsDebitCardsTRUE                -7.897e-02  3.879e-02  -2.036 0.041791 *  
## wheelchairAccessibleSeatingTRUE       2.594e-02  1.592e-02   1.630 0.103186    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.447 on 5827 degrees of freedom
## Multiple R-squared:  0.2077, Adjusted R-squared:  0.1978 
## F-statistic: 20.93 on 73 and 5827 DF,  p-value: < 2.2e-16

Conclusion

  • Residual Standard Error (RSE) is 0.447 on 5827 degrees of freedom.
  • Multiple R-squared (\(R^2\)) is 0.2077 The model explains 20.77% of the variance in rating.
  • Adjusted R-squared (\(R^2\)) is 0.1978. This is a better measure for comparison, as it penalizes models for including irrelevant variables. If the Adjusted \(R^2\) is much lower than \(R^2\), it confirms that many of your predictors (likely the numerous primaryType dummy variables) are not useful and the model is slightly overfit.
  • Use the step_forward model to predict rating in RestaurantTest and calculate MAE and MSE.
# Predicting rating
ypred_forward = predict(object=step_forward, newdata = RestaurantTest)

# mean absolute error of predicted rating, and Actual rating
MAE(y_pred = ypred_forward, y_true = RestaurantTest$rating)
## [1] 0.3191636
# mean square error of predicted rating, and Actual rating
MSE(y_pred = ypred_forward, y_true = RestaurantTest$rating)
## [1] 0.1854483